PH345: Winter 2025
Edward Tufte: American political scientist, statistician, and professor emeritus at Yale University
‘Godfather’ of data visualization and visual presentation of information
Author of The Visual Display of Quantitative Information (2001)
Photo by Keegan Peterzell - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=40367115
Three datasets:
| Statistic | Dataset 1 | Dataset 2 | Dataset 3 |
|---|---|---|---|
| n | 142 | 142 | 142 |
| mean.x | 54.3 | 54.3 | 54.3 |
| sd.x | 16.8 | 16.8 | 16.8 |
| mean.y | 47.8 | 47.8 | 47.8 |
| sd.y | 26.9 | 26.9 | 26.9 |
| cor.xy | -0.07 | -0.06 | -0.07 |
Show the data
All datasets have nearly equal summary statistics:
| Dataset | Intercept | Slope |
|---|---|---|
| away | 53.43 | -0.10 |
| bullseye | 53.81 | -0.11 |
| circle | 53.80 | -0.11 |
| dino | 53.45 | -0.10 |
| dots | 53.10 | -0.10 |
| h_lines | 53.21 | -0.10 |
| high_lines | 53.81 | -0.11 |
| slant_down | 53.85 | -0.11 |
| slant_up | 53.81 | -0.11 |
| star | 53.33 | -0.10 |
| v_lines | 53.89 | -0.11 |
| wide_lines | 53.63 | -0.11 |
| x_shape | 53.55 | -0.11 |
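A minimal sketch of how statistics like those in the tables above can be computed. The lecture datasets are not reproduced here, so the example uses the first two datasets of Anscombe's quartet as a stand-in; that choice, the function name `summarize`, and the reading of "Intercept" and "Slope" as an ordinary least-squares fit of y on x are all assumptions of this sketch.

```python
import math

def summarize(xs, ys):
    """Return n, means, sample SDs, Pearson correlation, and the
    least-squares intercept and slope for the regression of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products around the means
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {
        "n": n,
        "mean.x": mean_x,
        "sd.x": math.sqrt(sxx / (n - 1)),
        "mean.y": mean_y,
        "sd.y": math.sqrt(syy / (n - 1)),
        "cor.xy": sxy / math.sqrt(sxx * syy),
        "intercept": mean_y - slope * mean_x,
        "slope": slope,
    }

# Stand-in data (assumption): Anscombe's quartet, datasets I and II
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

for name, ys in [("I", y1), ("II", y2)]:
    stats = summarize(x, ys)
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

The two printed rows agree closely even though the first dataset is roughly linear and the second is curved, which is the same point the tables make: matching summaries do not imply matching data.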
Steps:
Start with a set of paired numbers \((x_i, y_i)\), where \(i\) indexes the pairs: \((x_1, y_1)\) is the first pair, \((x_2, y_2)\) is the second pair, and so on.
Place the points on a Cartesian coordinate system. The labeling of the points reflects the convention that \(x_i\) goes on the x-axis and \(y_i\) goes on the y-axis.
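The two steps above can be sketched in plain Python. This is only an illustration: the function name `ascii_scatter`, the character-grid output, and the example values are assumptions made for the sketch, and a real plot would normally be drawn with a plotting library.

```python
def ascii_scatter(xs, ys, width=21, height=11):
    """Place paired values (x_i, y_i) on a character grid.

    x runs left to right, y runs bottom to top, as on Cartesian axes.
    """
    if len(xs) != len(ys):
        raise ValueError("x and y must be paired, so they need equal length")
    x_min, y_min = min(xs), min(ys)
    # Fall back to 1 so constant data does not divide by zero
    x_span = (max(xs) - x_min) or 1
    y_span = (max(ys) - y_min) or 1
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):  # step 1: walk through the pairs
        col = round((x - x_min) / x_span * (width - 1))
        row = round((y - y_min) / y_span * (height - 1))
        grid[height - 1 - row][col] = "*"  # step 2: row 0 is the top line
    return "\n".join("".join(line) for line in grid)

# Made-up pairs, for illustration only
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 1, 4, 3, 6, 5]
print(ascii_scatter(xs, ys))
```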
Lung-cancer deaths per million in 1950 (\(y\)) against annual per-capita cigarette consumption in 1930 (\(x\)) for 11 countries.
So don’t create a scatterplot if you don’t want to imply a relationship.
Two main types of scatterplots:
\(x\) and \(y\) are both uncontrolled. The goal is to show whether they co-vary.
\(x\) is a controlled or “independent” variable, e.g. time, age, dose, or an experimentally controlled variable.
When he wasn’t blackmailing lords and being sued for libel, William Playfair invented the pie chart, the bar graph, and the line graph
Cara Giaimo, 2016
Never at any former time was wheat so cheap, in proportion to mechanical labor, as it is in the present time (Playfair)
Figure 3, https://onlinelibrary.wiley.com/doi/epdf/10.1002/jhbs.20078; Originally from Tufte, p34
Direct scatterplot of wheat price and wage, connected by consecutive years
It is now very easy to see Playfair’s claim about the inflation-adjusted price of wheat.
Statistical graphics should reveal data (Tufte, p13)
Reveal the data at several levels of detail
Aesthetics are quantitative mappings of data to visual properties:
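As a rough illustration of what such a mapping looks like, the sketch below converts data values into positions and a point size with a linear scale. The helper `linear_scale`, the pixel ranges, and the example records are hypothetical and chosen only for this sketch; plotting libraries provide their own scale machinery.

```python
def linear_scale(domain, out_range):
    """Return a function mapping values in `domain` linearly onto `out_range`."""
    d0, d1 = domain
    r0, r1 = out_range
    if d0 == d1:
        raise ValueError("domain must span a nonzero interval")

    def scale(value):
        return r0 + (value - d0) / (d1 - d0) * (r1 - r0)

    return scale

# Made-up records: (x, y, weight)
records = [(0.0, 10.0, 1.0), (5.0, 20.0, 3.0), (10.0, 30.0, 5.0)]

x_pos = linear_scale((0.0, 10.0), (0.0, 400.0))    # x -> horizontal pixels
y_pos = linear_scale((10.0, 30.0), (300.0, 0.0))   # y -> vertical pixels, top is 0
radius = linear_scale((1.0, 5.0), (2.0, 10.0))     # weight -> point radius

marks = [
    {"px": x_pos(x), "py": y_pos(y), "radius": radius(w)}
    for x, y, w in records
]
for mark in marks:
    print(mark)
```

Each visual property gets its own scale, so the same data column could be remapped (for example from position to size) without touching the data.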
Land and ocean temperature anomalies from 1850 to 2024 with respect to the 1901-2000 average
Separate data for northern and southern hemispheres
Average temperature anomalies in the northern hemisphere over time
Emphasis on interyear variability
Emphasis on trend
Emphasis on positive vs negative deviation
Emphasis on positive vs negative deviation, also on time spent above or below
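An anomaly "with respect to" a baseline period is simply each value minus the average over that period. A minimal sketch, assuming yearly values stored in a dictionary; the function names `anomalies` and `split_by_sign` and the temperatures are made up for illustration and are not the data shown in the figures.

```python
def anomalies(series, base_start, base_end):
    """Subtract the baseline-period mean from every value.

    `series` maps year -> value; the baseline covers base_start..base_end inclusive.
    """
    base = [v for year, v in series.items() if base_start <= year <= base_end]
    if not base:
        raise ValueError("no observations fall inside the baseline period")
    baseline = sum(base) / len(base)
    return {year: v - baseline for year, v in series.items()}

def split_by_sign(anoms):
    """Separate positive and negative deviations, e.g. to color them differently."""
    above = {year: a for year, a in anoms.items() if a > 0}
    below = {year: a for year, a in anoms.items() if a < 0}
    return above, below

# Made-up yearly temperatures in degrees C (not real observations)
temps = {2000: 14.0, 2001: 14.5, 2002: 13.5, 2003: 15.0}

anoms = anomalies(temps, 2000, 2002)  # baseline mean over 2000-2002
above, below = split_by_sign(anoms)
print(anoms)
print("above baseline:", sorted(above), "below baseline:", sorted(below))
```

Splitting by sign is what lets a chart emphasize positive versus negative deviation, for instance by filling the two groups in different colors.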
Encourage comparison between data
[Still need]
Doll, R., 1955. Etiology of lung cancer. In Advances in cancer research (Vol. 3, pp. 1-50).
Friendly, M. and Denis, D., 2005. The early origins and development of the scatterplot. Journal of the History of the Behavioral Sciences, 41(2), pp.103-130.
Tufte, E.R., 2001. The visual display of quantitative information (2nd ed.). Graphics Press.